{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "0174eb96",
   "metadata": {},
   "source": [
    "# Using different Embedding Models\n",
    "\n",
    "Ragas allows users to change the default embedding model used in the evaluation task.\n",
    "\n",
    "This guide will show you how to use different embedding models for evaluation in Ragas."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "55f0f9b9",
   "metadata": {},
   "source": [
    "## Evaluating with Azure OpenAI Embeddings\n",
    "\n",
    "Ragas uses OpenAI embeddings by default. In this example, we use Azure OpenAI embeddings from LangChain with the embedding model `text-embedding-ada-002`. We use `gpt-35-turbo-16k` from Azure OpenAI as the LLM for evaluation and `AnswerSimilarity` as the metric.\n",
    "\n",
    "To start, we initialise `gpt-35-turbo-16k` on Azure and create a chat model using LangChain."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "25c72521-3372-4663-81e4-c159b0edde40",
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "from langchain_openai.chat_models import AzureChatOpenAI\n",
    "\n",
    "os.environ[\"OPENAI_API_VERSION\"] = \"2023-05-15\"\n",
    "os.environ[\"OPENAI_API_KEY\"] = \"your-openai-key\"\n",
    "\n",
    "azure_model = AzureChatOpenAI(\n",
    "    deployment_name=\"your-deployment-name\",\n",
    "    model=\"gpt-35-turbo-16k\",\n",
    "    azure_endpoint=\"https://your-endpoint.openai.azure.com/\",\n",
    "    openai_api_type=\"azure\",\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f1fdb48b",
   "metadata": {},
   "source": [
    "To use Azure OpenAI embeddings, we instantiate an object of the `AzureOpenAIEmbeddings` class from LangChain."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain_openai.embeddings import AzureOpenAIEmbeddings\n",
    "\n",
    "azure_embeddings = AzureOpenAIEmbeddings(\n",
    "    deployment=\"your-deployment-name\",\n",
    "    model=\"text-embedding-ada-002\",\n",
    "    azure_endpoint=\"https://your-endpoint.openai.azure.com/\",\n",
    "    openai_api_type=\"azure\",\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "c4f0adec",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "DatasetDict({\n",
       "    baseline: Dataset({\n",
       "        features: ['question', 'ground_truths', 'answer', 'contexts'],\n",
       "        num_rows: 30\n",
       "    })\n",
       "})"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# dataset\n",
    "from datasets import load_dataset\n",
    "\n",
    "amnesty_qa = load_dataset(\"explodinggradients/amnesty_qa\", \"english\")\n",
    "amnesty_qa"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "62645da8-6a52-4cbb-bec7-59f7e153cd38",
   "metadata": {},
   "source": [
    "To use `AzureOpenAIEmbeddings` with the `AnswerSimilarity` metric, we import the `answer_similarity` metric and pass the embeddings to the `evaluate` function."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "307321ed",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Evaluating: 100%|██████████| 5/5 [00:19<00:00,  3.86s/it]\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "{'answer_similarity': 0.8494}"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from ragas.metrics import answer_similarity\n",
    "from ragas import evaluate\n",
    "\n",
    "result = evaluate(\n",
    "    amnesty_qa[\"eval\"].select(range(5)),  # showing only 5 for demonstration\n",
    "    metrics=[answer_similarity],\n",
    "    llm=azure_model,\n",
    "    embeddings=azure_embeddings,\n",
    ")\n",
    "result"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d5f48c16",
   "metadata": {},
   "source": [
    "That's it! `answer_similarity` now uses `AzureOpenAIEmbeddings` under the hood for evaluation."
   ]
  },
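  {
   "cell_type": "markdown",
   "id": "cosine-sketch-md",
   "metadata": {},
   "source": [
    "To build intuition for why the choice of embedding model can change the score, here is a minimal sketch that compares two embedding vectors with cosine similarity. It assumes the metric scores an answer by the cosine similarity between the embedding of the answer and the embedding of the ground truth; it is not the Ragas implementation. The helper `cosine_similarity` and the toy vectors are hypothetical and only for illustration. A different embedding model maps the same texts to different vectors, so the resulting score can differ."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cosine-sketch-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "import math\n",
    "\n",
    "\n",
    "def cosine_similarity(u, v):\n",
    "    # Hypothetical helper: cosine of the angle between two equal-length vectors.\n",
    "    if len(u) != len(v):\n",
    "        raise ValueError('vectors must have the same length')\n",
    "    dot = sum(a * b for a, b in zip(u, v))\n",
    "    norm_u = math.sqrt(sum(a * a for a in u))\n",
    "    norm_v = math.sqrt(sum(b * b for b in v))\n",
    "    if norm_u == 0 or norm_v == 0:\n",
    "        raise ValueError('cosine similarity is undefined for a zero vector')\n",
    "    return dot / (norm_u * norm_v)\n",
    "\n",
    "\n",
    "# Made-up 3-dimensional vectors standing in for real embeddings,\n",
    "# which typically have hundreds of dimensions.\n",
    "answer_vec = [1.0, 0.0, 1.0]\n",
    "ground_truth_vec = [1.0, 1.0, 0.0]\n",
    "\n",
    "score = cosine_similarity(answer_vec, ground_truth_vec)\n",
    "print(round(score, 4))"
   ]
  },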
  {
   "cell_type": "markdown",
   "id": "f490031e-fb73-4170-8762-61cadb4031e6",
   "metadata": {},
   "source": [
    "## Evaluating with FastEmbed Embeddings\n",
    "\n",
    "`FastEmbed` is a Python library built for embedding generation, with support for popular text models. To use a FastEmbed model, instantiate an object of the `FastEmbedEmbeddings` class from LangChain. More information about FastEmbed and its supported models can be found [here](https://github.com/qdrant/fastembed)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "85e313f2-e45c-4551-ab20-4e526e098740",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100%|██████████| 252M/252M [00:09<00:00, 25.7MiB/s] \n"
     ]
    }
   ],
   "source": [
    "from langchain_community.embeddings.fastembed import FastEmbedEmbeddings\n",
    "\n",
    "fast_embeddings = FastEmbedEmbeddings(model_name=\"BAAI/bge-base-en\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c9ddf74a-9830-4e1a-a4dd-7e5ec17a71e4",
   "metadata": {},
   "source": [
    "Now let's change the embedding model used for `AnswerSimilarity` by passing the `FastEmbedEmbeddings` object we just created as the `embeddings` argument of the `evaluate` function."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "20882d05-1b54-4d17-88a0-f7ada2d6a576",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Evaluating: 100%|██████████| 5/5 [00:21<00:00,  4.24s/it]\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "{'answer_similarity': 0.8494}"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "result2 = evaluate(\n",
    "    amnesty_qa[\"eval\"].select(range(5)),  # showing only 5 for demonstration\n",
    "    metrics=[answer_similarity],\n",
    "    llm=azure_model,\n",
    "    embeddings=fast_embeddings,\n",
    ")\n",
    "\n",
    "result2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Evaluating with HuggingFace Embeddings\n",
    "\n",
    "Ragas supports embedding models from HuggingFace. With the `HuggingfaceEmbeddings` class in Ragas, embedding models supported by HuggingFace can be used directly for the evaluation task."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# To use embedding models from HuggingFace, you need to install the following\n",
    "%pip install sentence-transformers"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [],
   "source": [
    "from ragas.embeddings import HuggingfaceEmbeddings\n",
    "\n",
    "hf_embeddings = HuggingfaceEmbeddings(model_name=\"BAAI/bge-small-en\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As in the previous sections, we pass the HuggingFace embeddings to the `evaluate` function along with the metrics."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Evaluating:   0%|          | 0/5 [00:00<?, ?it/s]"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Evaluating: 100%|██████████| 5/5 [00:21<00:00,  4.24s/it]\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "{'answer_similarity': 0.8494}"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "result3 = evaluate(\n",
    "    amnesty_qa[\"eval\"].select(range(5)),  # showing only 5 for demonstration\n",
    "    metrics=[answer_similarity],\n",
    "    llm=azure_model,\n",
    "    embeddings=hf_embeddings,\n",
    ")\n",
    "\n",
    "result3"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
